Executive Summary

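This report describes the process of building a machine learning model that classifies how a weight lifting exercise (the unilateral dumbbell biceps curl) was performed into one of five classes, labeled A to E in the data. According to the dataset's documentation, class A is the exercise performed exactly to specification, while classes B to E are common mistakes: throwing the elbows to the front (B), lifting the dumbbell only halfway (C), lowering the dumbbell only halfway (D), and throwing the hips to the front (E).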

More about the research and data used can be found on the following website: http://groupware.les.inf.puc-rio.br/har#weight_lifting_exercises#ixzz6CzLP0YxO

TODO: MODEL RESULTS

Exploratory Data Analysis

Our data source URLs:

TRAINING_SOURCE_FILE_URL <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
TESTING_SOURCE_FILE_URL <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"
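The loading step reads from TRAINING_FILE_PATH and TESTING_FILE_PATH, which this report never defines. A minimal sketch of how the files might be fetched once and cached locally is given here; the two path values and the download_if_missing helper are assumptions made for illustration, not part of the original analysis.

```r
# Hypothetical local paths; the original report does not show these values.
TRAINING_FILE_PATH <- "pml-training.csv"
TESTING_FILE_PATH <- "pml-testing.csv"

# Download `url` to `path` only when the file is not already present.
# Returns TRUE if a download was attempted, FALSE if the cached file was reused.
download_if_missing <- function(url, path) {
  if (file.exists(path)) {
    return(FALSE)
  }
  download.file(url, destfile = path, mode = "wb")
  TRUE
}

# Intended usage (commented out so the sketch does not touch the network):
# download_if_missing(TRAINING_SOURCE_FILE_URL, TRAINING_FILE_PATH)
# download_if_missing(TESTING_SOURCE_FILE_URL, TESTING_FILE_PATH)
```

Caching the files this way keeps repeated knits of the report from downloading the data again.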

Loading and splitting the data (training, validating and testing):

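library(caret)  # provides createDataPartition()

# Treat both literal "NA" and spreadsheet "#DIV/0!" errors as missing values
NA_STRINGS <- c("NA", "#DIV/0!")
training <- read.csv(TRAINING_FILE_PATH, na.strings = NA_STRINGS)
testing <- read.csv(TESTING_FILE_PATH, na.strings = NA_STRINGS)
# Hold out 30% of the labelled rows for validation, stratified by the outcome
# column `classe` (spelled out in full rather than relying on partial matching)
in.training <- createDataPartition(y = training$classe, p = 0.7, list = FALSE)
validating <- training[-in.training, ]
training <- training[in.training, ]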

What our training dataset looks like:

dim(training)
## [1] 13737   160
print(table(training$classe))
## 
##    A    B    C    D    E 
## 3906 2658 2396 2252 2525

Checking the presence of NAs per variable:

na.stats
## 
## (-0.001,0.05]      (0.95,1] 
##            60           100

So 100 of the 160 variables are more than 95% NAs, while the remaining 60 are at most 5% NAs. The 100 mostly empty variables will be ignored in our models.
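The code that produced na.stats is not shown in this report. One way such a per-variable summary could be computed is sketched here on a toy data frame; the names toy, na.fraction, na.bins, na.counts and mostly.na.columns are hypothetical, and the bin edges are an assumption chosen to mirror the 5% and 95% cut-offs discussed above.

```r
# Toy data frame standing in for `training` (values are illustrative only).
toy <- data.frame(
  full   = c(1, 2, 3, 4),
  sparse = c(NA, NA, NA, NA),
  mostly = c(NA, NA, NA, 1)
)

# Fraction of missing values in each column.
na.fraction <- colMeans(is.na(toy))

# Bin the fractions, then count how many columns fall in each bin.
na.bins <- cut(na.fraction, breaks = c(0, 0.05, 0.95, 1), include.lowest = TRUE)
na.counts <- table(na.bins)

# Columns with more than 95% missing values are candidates for removal.
mostly.na.columns <- names(na.fraction)[na.fraction > 0.95]
```

Working with the fraction rather than the raw count of NAs keeps the threshold independent of how many rows ended up in the training split.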

Outliers.

You can see more details about the training dataset in the Appendix.

Models

Remove unnecessary columns
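As a rough sketch of this step, the columns named in unwanted.columns and almost.empty.columns (both listed in the Appendix) could be dropped by name. The drop_columns helper and the toy objects are hypothetical and only illustrate the idea; the original analysis may have done this differently.

```r
# Toy stand-ins; in the report these would be `training`, `unwanted.columns`
# and `almost.empty.columns`.
toy <- data.frame(
  X          = 1:3,
  num_window = 4:6,
  roll_belt  = c(1.4, 1.5, 1.6),
  classe     = c("A", "B", "A")
)
toy.unwanted <- c("X", "num_window")
toy.almost.empty <- c("kurtosis_yaw_belt")  # absent from `toy`; setdiff ignores it

# Keep every column of `df` whose name is not listed in `columns`.
drop_columns <- function(df, columns) {
  keep <- setdiff(names(df), columns)
  df[, keep, drop = FALSE]
}

toy.reduced <- drop_columns(toy, c(toy.unwanted, toy.almost.empty))
```

Whatever form this step takes, the same columns would need to be dropped from the validating and testing sets so that all three share one set of predictors.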

NZV
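NZV presumably refers to near-zero-variance filtering, for which caret provides nearZeroVar(). The sketch here is a simplified base R approximation of that rule, assuming caret's default thresholds (freqCut = 95/5 and uniqueCut = 10); the near_zero_variance function and the toy data are hypothetical and are not the code used in this report.

```r
# Flag a column as near-zero-variance when its most common value dominates
# the second most common one AND it has few distinct values overall.
near_zero_variance <- function(x, freq.cut = 95 / 5, unique.cut = 10) {
  # Value counts, most frequent first (table() ignores NAs by default).
  counts <- sort(as.vector(table(x)), decreasing = TRUE)
  if (length(counts) <= 1) {
    return(TRUE)  # constant or entirely missing column
  }
  freq.ratio <- counts[1] / counts[2]
  percent.unique <- 100 * length(counts) / length(x)
  freq.ratio > freq.cut && percent.unique <= unique.cut
}

# Toy data: one nearly constant column and one with all distinct values.
toy <- data.frame(
  almost_constant = c(rep(0, 99), 1),
  varied          = 1:100
)

nzv.flags <- vapply(toy, near_zero_variance, logical(1))
```

In practice, calling caret's own nearZeroVar() on the training set would be the more direct route, since it also reports the underlying metrics per column.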

Outlier removal.

Normalization?

Models

Model selection

Accuracy and Residual Analysis

Prediction

Conclusion

Appendix

Variables that are being ignored:

unwanted.columns
## [1] "X"                    "raw_timestamp_part_1" "raw_timestamp_part_2"
## [4] "cvtd_timestamp"       "num_window"
almost.empty.columns
##   [1] "kurtosis_yaw_belt"        "skewness_yaw_belt"       
##   [3] "kurtosis_yaw_dumbbell"    "skewness_yaw_dumbbell"   
##   [5] "kurtosis_yaw_forearm"     "skewness_yaw_forearm"    
##   [7] "kurtosis_picth_forearm"   "skewness_pitch_forearm"  
##   [9] "kurtosis_roll_forearm"    "skewness_roll_forearm"   
##  [11] "max_yaw_forearm"          "min_yaw_forearm"         
##  [13] "amplitude_yaw_forearm"    "kurtosis_picth_arm"      
##  [15] "skewness_pitch_arm"       "kurtosis_roll_arm"       
##  [17] "skewness_roll_arm"        "kurtosis_picth_belt"     
##  [19] "skewness_roll_belt.1"     "kurtosis_yaw_arm"        
##  [21] "skewness_yaw_arm"         "kurtosis_roll_belt"      
##  [23] "skewness_roll_belt"       "max_yaw_belt"            
##  [25] "min_yaw_belt"             "amplitude_yaw_belt"      
##  [27] "kurtosis_roll_dumbbell"   "skewness_roll_dumbbell"  
##  [29] "max_yaw_dumbbell"         "min_yaw_dumbbell"        
##  [31] "amplitude_yaw_dumbbell"   "kurtosis_picth_dumbbell" 
##  [33] "skewness_pitch_dumbbell"  "max_roll_belt"           
##  [35] "max_picth_belt"           "min_roll_belt"           
##  [37] "min_pitch_belt"           "amplitude_roll_belt"     
##  [39] "amplitude_pitch_belt"     "var_total_accel_belt"    
##  [41] "avg_roll_belt"            "stddev_roll_belt"        
##  [43] "var_roll_belt"            "avg_pitch_belt"          
##  [45] "stddev_pitch_belt"        "var_pitch_belt"          
##  [47] "avg_yaw_belt"             "stddev_yaw_belt"         
##  [49] "var_yaw_belt"             "var_accel_arm"           
##  [51] "avg_roll_arm"             "stddev_roll_arm"         
##  [53] "var_roll_arm"             "avg_pitch_arm"           
##  [55] "stddev_pitch_arm"         "var_pitch_arm"           
##  [57] "avg_yaw_arm"              "stddev_yaw_arm"          
##  [59] "var_yaw_arm"              "max_roll_arm"            
##  [61] "max_picth_arm"            "max_yaw_arm"             
##  [63] "min_roll_arm"             "min_pitch_arm"           
##  [65] "min_yaw_arm"              "amplitude_roll_arm"      
##  [67] "amplitude_pitch_arm"      "amplitude_yaw_arm"       
##  [69] "max_roll_dumbbell"        "max_picth_dumbbell"      
##  [71] "min_roll_dumbbell"        "min_pitch_dumbbell"      
##  [73] "amplitude_roll_dumbbell"  "amplitude_pitch_dumbbell"
##  [75] "var_accel_dumbbell"       "avg_roll_dumbbell"       
##  [77] "stddev_roll_dumbbell"     "var_roll_dumbbell"       
##  [79] "avg_pitch_dumbbell"       "stddev_pitch_dumbbell"   
##  [81] "var_pitch_dumbbell"       "avg_yaw_dumbbell"        
##  [83] "stddev_yaw_dumbbell"      "var_yaw_dumbbell"        
##  [85] "max_roll_forearm"         "max_picth_forearm"       
##  [87] "min_roll_forearm"         "min_pitch_forearm"       
##  [89] "amplitude_roll_forearm"   "amplitude_pitch_forearm" 
##  [91] "var_accel_forearm"        "avg_roll_forearm"        
##  [93] "stddev_roll_forearm"      "var_roll_forearm"        
##  [95] "avg_pitch_forearm"        "stddev_pitch_forearm"    
##  [97] "var_pitch_forearm"        "avg_yaw_forearm"         
##  [99] "stddev_yaw_forearm"       "var_yaw_forearm"

Boxplot for each numeric variable per classe: